Abstract. Statically reasoning in the presence of and about exceptions
is challenging: exceptions worsen the well-known mutual recursion be- 
tween data-flow and control-flow analysis. The recent development of 
pushdown control-flow analysis for the λ-calculus hints at a way to
improve analysis of exceptions: a pushdown stack can precisely match 
catches to throws in the same way it matches returns to calls. This work 
generalizes pushdown control-flow analysis to object-oriented programs 
and to exceptions. Pushdown analysis of exceptions improves precision 
over the next best analysis, Bravenboer and Smaragdakis's Doop, by 
orders of magnitude. By then generalizing abstract garbage collection 
to object-oriented programs, we reduce analysis time by half over pure 
pushdown analysis. We evaluate our implementation for Dalvik bytecode 
on standard benchmarks as well as several Android applications. 

1 Introduction 

Exceptions are not exceptional enough. Thrown exceptions — or the possibility 
thereof — pervade the control-flow structure of modern object-oriented programs. 
A static analyzer grappling with Java must concede that even innocent-looking 
expressions like 

x / in.read()

could throw four exceptions: ArithmeticException (for divide by 0); IOException
(for reading); NullPointerException (for dereferencing in); and technically
even MethodNotFoundException (if the read method was removed after this
file was compiled).

To make sense of a program, a static analyzer must exploit data-flow information
to rule out exceptions (such as NullPointerException and MethodNotFoundException
in the prior expression). Yet precise data-flow information requires a precise
analysis of exceptions, so co-analyzing data- and exception-flow is essential for
precision. Even then, many exceptions (like IOException or ArithmeticException
in the prior expression) cannot be ruled out statically. It is therefore critical
to precisely match catchers to throwers.



Exception-flow fundamentally follows the structure of the program stack 
at run-time. Because the stack can grow without bound, traditional analysis 
regimes like k-CFA [27] and its many variants implicitly or explicitly finitize the
stack during abstraction. In effect, analyzers carve up dynamic return points 
and exception-handling points among a finite number of abstract return con- 
texts. When two dynamic return points map to the same abstract context, the 
analyzer loses the ability to distinguish them. This confusion is a control-flow 
analog of the classic data-flow value merging problem. 

To ground this discussion, consider the following Java fragment: 

try {
    maybeThrow(); // Call 1
} catch (Exception e) {
    System.err.println("Got an exception"); // Handler 1
}

maybeThrow(); // Call 2

Under a monovariant abstraction like 0CFA [27], where the distinction between
different invocations of the same procedure is lost, it will seem as though
exceptions thrown from Call 2 can be caught by Handler 1.

The fundamental problem with the analysis of exceptions is that the abstract 
program stack is finite. Our message is that pushdown analysis, which does not
finitize the program stack, is critical for precise analysis of exception handling,
yet it remains computable. Simply switching to pushdown analysis yields orders
of magnitude improvements in precision over Doop [6], the current state of the
art in exception-flow analysis.

Spotting an easy opportunity to improve running time, we reduce the state-space
via abstract garbage collection [17]. We then further improve running time
and precision by combining abstract garbage collection with live-range analysis. 
In the end, we cut the time cost of pushdown exception-flow analysis by half. 

Our implementation for Java (which targets the Dalvik virtual machine for 
Android) is publicly available: 

https://github.com/shuyingliang/pushdownoo

1.1 Contributions 

We make several contributions: 

1. The first application of the abstracting abstract machines (AAM)
methodology [29] to create a static analyzer for Java.

2. A pushdown flow analysis for precisely co-analyzing data-, control- and
exception-flow.

3. An empirical evaluation demonstrating two orders of magnitude of preci- 
sion improvement over the current best analysis for exception-flow within 
reasonable analysis time. 



2 The setting: An object-oriented bytecode 



In this section, we define an object-oriented bytecode language closely modeled 
on the Dalvik virtual machine to which Java applications for Android are com- 
piled. Subsequent sections develop our analysis for this language. 

2.1 Syntax 

The syntax of the bytecode language is given in Figure 1. Statements encode
individual actions for the machine; atomic expressions encode atomically
computable values; and complex expressions encode expressions with possible
non-termination or side effects. There are four kinds of names: Reg for registers,
ClassName for class names, FieldName for field names and MethodName for method
names. There are two special register names: ret, which holds the return value of
the last function called, and exn, which holds the most recently thrown exception.

The syntax is largely usual for a Java-like bytecode, but let us explain the
statements related to exceptions in method-def in more detail:

— (throws class-name . . . ) indicates that a method makes a throws declaration. 

— (push-handler class-name label) pushes a handler frame on the stack. The 
frame will catch exceptions of type class and divert execution to label. 

— (pop-handler) pops the top-most handler frame off the stack. 

With respect to a given program, we assume a syntactic metafunction
$\mathcal{S} : \mathsf{Label} \to \mathsf{Stmt}^*$, which maps a label to the
sequence of statements that start with that label.

2.2 Concrete semantics 

Interpretation of bytecode programs is defined in terms of a CESK-style machine 
model. States of this machine consist of a series of statements, a frame pointer, 
a heap, and a stack. The evaluation of a program is defined as the set of ma- 
chine configurations reachable by machine transitions from the initial program. 
Formally, the evaluation function,
$\mathcal{E} : \mathsf{Stmt}^* \to \mathcal{P}(\mathit{Conf})$, is defined as:

$$\mathcal{E}(\vec{s}) = \{ c : \mathcal{I}(\vec{s}) \Rightarrow^* c \}.$$

This function injects, using $\mathcal{I} : \mathsf{Stmt}^* \to \mathit{Conf}$, an
initial program sequence into an initial machine configuration. From this initial
configuration, evaluation is defined by the set of configurations reached by the
reflexive, transitive closure of the machine transition relation,
$(\Rightarrow) \subseteq \mathit{Conf} \times \mathit{Conf}$. The next section
describes the details of machine configurations; the subsequent section defines
the machine transition relation, $(\Rightarrow)$.
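As a minimal sketch of what "the set of configurations reachable by the reflexive, transitive closure" means operationally, the following Python computes such a set with a worklist. The names `evaluate`, `step` and `toy_step` are illustrative assumptions, not part of the paper's implementation, and the loop only terminates when the reachable set is finite (which the concrete semantics does not guarantee).

```python
from collections import deque

def evaluate(initial, step):
    """Return every configuration reachable from `initial` under `step`.

    `step` maps one configuration to an iterable of successor
    configurations; configurations must be hashable.
    """
    seen = {initial}              # reflexive: the initial configuration counts
    work = deque([initial])
    while work:
        c = work.popleft()
        for nxt in step(c):
            if nxt not in seen:   # explore each configuration once
                seen.add(nxt)
                work.append(nxt)
    return seen

# Toy stand-in for a machine transition relation: a counter modulo 5.
def toy_step(c):
    return [(c + 2) % 5]

reachable = evaluate(0, toy_step)
```

Here `reachable` contains all five counter values, since repeatedly adding 2 modulo 5 visits every residue.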



program ::= class-def ...

class-def ::= (attribute ... class class-name extends class-name
                (field-def ...) (method-def ...))
field-def ::= (field attribute ... field-name type)
method-def ∈ MethodDef ::= (method attribute ... method-name (type ...) type
                             (throws class-name ...) (limit n) s ...)
s ∈ Stmt ::= (label label) | (nop) | (line int) | (goto label)
           | (if æ (goto label)) | (assign name [æ | ce]) | (return æ)
           | (field-put æ field-name æ_v) | (field-get name æ field-name)
           | (push-handler class-name label) | (pop-handler) | (throw æ)
æ ∈ AExp ::= this | true | false | null | void | name | int
           | (atomic-op æ ... æ) | instance-of(æ, class-name)
ce ::= (new class-name) | (invoke-kind (æ ... æ) (type_0 ... type_n))
invoke-kind ::= invoke-static | invoke-direct | invoke-virtual | invoke-interface | invoke-super
type ::= class-name | int | byte | char | boolean
attribute ::= public | private | protected | final | abstract.

Fig. 1: An object-oriented bytecode adapted from the Android specification [2T]. 

2.3 Concrete configuration-space 

Figure 2 presents the machine's concrete configuration-space. The machine has
an explicit stack, which under structural abstraction will become the stack com- 
ponent of a pushdown system. The stack contains not only call frames, but 
also mini-frames for exception handlers. The FramePointer is the environmental 
component of the machine: by pairing the frame pointer with a register name, 
it forms the address of its value in the store. 

The initial configuration consists of the program, the initial frame pointer, 
an empty heap, and an empty stack: 

$$c_0 = \mathcal{I}(\vec{s}) = (\vec{s}, fp_0, [], \langle\rangle).$$

2.4 Concrete transition relation 

In this section, we describe the essential cases of the $(\Rightarrow)$ relation,
which deal with objects and exceptions. The remaining cases are in the appendix.

The machine relies on helper functions for evaluating atomic expressions,
looking up field values, and allocating memory:

- $\mathcal{A} : \mathsf{AExp} \times \mathit{FramePointer} \times \mathit{Store} \rightharpoonup \mathit{Val}$ evaluates atomic expressions:
  $\mathcal{A}(name, fp, \sigma) = \sigma(fp, name)$ [variable look-up].



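$$\begin{aligned}
c \in \mathit{Conf} &= \mathsf{Stmt}^* \times \mathit{FramePointer} \times \mathit{Store} \times \mathit{Kont} && \text{[configurations]} \\
\sigma \in \mathit{Store} &= \mathit{Addr} \rightharpoonup \mathit{Val} && \text{[stores]} \\
a \in \mathit{Addr} &= \mathit{RegAddr} + \mathit{FieldAddr} && \text{[addresses]} \\
ra \in \mathit{RegAddr} &= \mathit{FramePointer} \times \mathsf{Reg} \\
fa \in \mathit{FieldAddr} &= \mathit{ObjectPointer} \times \mathsf{FieldName} \\
\kappa \in \mathit{Kont} &= \mathit{Frame}^* && \text{[continuations]} \\
\phi \in \mathit{Frame} &= \mathit{CallFrame} + \mathit{HandlerFrame} \\
\chi \in \mathit{CallFrame} &::= \mathbf{fun}(fp, \vec{s}) \\
\eta \in \mathit{HandlerFrame} &::= \mathbf{handle}(\textit{class-name}, \textit{label}) \\
d \in \mathit{Val} &= \mathit{ObjectValue} + \mathit{String} + \mathbb{Z} && \text{[values]} \\
ov \in \mathit{ObjectValue} &= \mathit{ObjectPointer} \times \mathsf{ClassName} \\
fp \in \mathit{FramePointer} &\text{ is an infinite set of frame pointers} && \text{[frame pointers]} \\
op \in \mathit{ObjectPointer} &\text{ is an infinite set of object pointers} && \text{[object pointers]}
\end{aligned}$$

Fig. 2: The concrete configuration-space.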



- $\mathcal{A}_{\mathcal{F}} : \mathsf{AExp} \times \mathit{FramePointer} \times \mathit{Store} \times \mathsf{FieldName} \rightharpoonup \mathit{Val}$ looks up fields:
  $\mathcal{A}_{\mathcal{F}}(\text{\ae}, fp, \sigma, \textit{field-name}) = \sigma(op, \textit{field-name})$ [field look-up],
  where $(op, \textit{class-name}) = \mathcal{A}(\text{\ae}, fp, \sigma)$.

Allocation FramePointer and ObjectPointer determine addresses for RegAddr
and FieldAddr respectively. We need to specify how to allocate these pointers:

- $\mathit{allocFP} : \mathit{Conf} \to \mathit{FramePointer}$ chooses a fresh frame pointer for a newly
invoked method.

- $\mathit{allocOP} : \mathit{Conf} \to \mathit{ObjectPointer}$ allocates a fresh object pointer at an
instantiation site.

For the sake of defining a concrete semantics, these could allocate increasingly
larger natural numbers. Under abstraction, these parameters provide the knob to
tune the polyvariance, context-sensitivity and object-sensitivity of the resulting
analysis.

New object creation Creating an object allocates a new object pointer, binds
the new object value at the destination register's address and initializes the
fields:

$$\overbrace{((\texttt{assign}\ name\ (\texttt{new}\ \textit{class-name})) : \vec{s}, fp, \sigma, \kappa)}^{c} \Rightarrow (\vec{s}, fp, \sigma'', \kappa), \text{ where}$$

$$\begin{aligned}
op &= \mathit{allocOP}(c) \\
\sigma' &= \sigma[(fp, name) \mapsto (op, \textit{class-name})] \\
\sigma'' &= \mathit{initObject}(\sigma', \textit{class-name}).
\end{aligned}$$



The helper function, $\mathit{initObject} : \mathit{Store} \times \mathsf{ClassName} \to \mathit{Store}$, initializes the field
addresses in the provided store.

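Instance field reference/update Referencing a field gets the object pointer
and then grabs the field value as an offset:

$$((\texttt{field-get}\ name\ \text{\ae}\ \textit{field-name}) : \vec{s}, fp, \sigma, \kappa) \Rightarrow (\vec{s}, fp, \sigma', \kappa), \text{ where}$$

$$\sigma' = \sigma[(fp, name) \mapsto \mathcal{A}_{\mathcal{F}}(\text{\ae}, fp, \sigma, \textit{field-name})].$$

Updating a field grabs the object, extracts the object pointer and updates the
associated field in the store:

$$((\texttt{field-put}\ \text{\ae}\ \textit{field-name}\ \text{\ae}_v) : \vec{s}, fp, \sigma, \kappa) \Rightarrow (\vec{s}, fp, \sigma', \kappa), \text{ where}$$

$$\begin{aligned}
\sigma' &= \sigma[(op, \textit{field-name}) \mapsto \mathcal{A}(\text{\ae}_v, fp, \sigma)] \\
(op, \textit{class-name}) &= \mathcal{A}(\text{\ae}, fp, \sigma).
\end{aligned}$$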

Method invocation Method invocation involves all four components of the
machine. Since the language supports inheritance, method resolution requires a
traversal of the class hierarchy. This traversal is not of interest, so we focus on the
helper function that performs method application: applyMethod. The function
applyMethod takes a method definition, arguments, a frame pointer, a store and
a continuation and produces the next configuration:

$$\mathit{applyMethod} : \mathsf{MethodDef} \times \mathsf{AExp}^* \times \mathit{FramePointer} \times \mathit{Store} \times \mathit{Kont} \rightharpoonup \mathit{Conf}.$$

It looks up the values of the arguments, binds them to the formal parameters of
the method, creates a new frame pointer and a new continuation:

$$\mathit{applyMethod}(m, \vec{\text{\ae}}, fp, \sigma, \kappa) = (\vec{s}, fp', \sigma', (fp, \vec{s}) : \kappa), \text{ where } fp' \text{ is fresh and}$$

$$\sigma' = \sigma[(fp', name_i) \mapsto \mathcal{A}(\text{\ae}_i, fp, \sigma)].$$

Finally, the transition looks up the method m and then passes it to applyMethod:

$$((\textit{invoke-kind}\ (\text{\ae}_0 \ldots \text{\ae}_n)\ (type_0 \ldots type_n)) : \vec{s}, fp, \sigma, \kappa) \Rightarrow \mathit{applyMethod}(m, \vec{\text{\ae}}, fp, \sigma, \kappa).$$

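Procedure return Returning a value restores the caller's context and puts the
return value in the dedicated return register, ret:

$$((\texttt{return}\ \text{\ae}) : \vec{s}, fp, \sigma, \mathbf{fun}(fp', \vec{s}') : \kappa) \Rightarrow (\vec{s}', fp', \sigma', \kappa), \text{ where}$$

$$\sigma' = \sigma[(fp', \texttt{ret}) \mapsto \mathcal{A}(\text{\ae}, fp, \sigma)].$$

If a HandlerFrame is on top of the stack, the transition will pop it without
changing any other part of the state:

$$((\texttt{return}\ \text{\ae}) : \vec{s}, fp, \sigma, \mathbf{handle}(\textit{class-name}, \textit{label}) : \kappa) \Rightarrow ((\texttt{return}\ \text{\ae}) : \vec{s}, fp, \sigma, \kappa).$$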



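Pushing and popping exception handlers Pushing and popping exception
handlers is straightforward:

$$((\texttt{push-handler}\ \textit{class-name}\ \textit{label}) : \vec{s}, fp, \sigma, \kappa) \Rightarrow (\vec{s}, fp, \sigma, \mathbf{handle}(\textit{class-name}, \textit{label}) : \kappa),$$

$$((\texttt{pop-handler}) : \vec{s}, fp, \sigma, \mathbf{handle}(\textit{class-name}, \textit{label}) : \kappa) \Rightarrow (\vec{s}, fp, \sigma, \kappa).$$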

Throwing and catching exceptions The throw statement peels away layers
of the stack until it finds a matching exception handler:

$$((\texttt{throw}\ \text{\ae}) : \vec{s}, fp, \sigma, \kappa) \Rightarrow \mathit{handle}(\text{\ae}, \vec{s}, fp, \sigma, \kappa),$$

where the function
$\mathit{handle} : \mathsf{AExp} \times \mathsf{Stmt}^* \times \mathit{FramePointer} \times \mathit{Store} \times \mathit{Kont} \rightharpoonup \mathit{Conf}$
does the peeling. If a matching handler is found, that is, class-name is a subclass
of class-name', where $(op, \textit{class-name}) = \mathcal{A}(\text{\ae}, fp, \sigma)$ and class-name' is from the
top HandlerFrame, execution jumps to the code block of the handler:

$$\mathit{handle}(\text{\ae}, \vec{s}, fp, \sigma, \mathbf{handle}(\textit{class-name}', \textit{label}) : \kappa') = (\mathcal{S}(\textit{label}), fp, \sigma[(fp, \texttt{exn}) \mapsto (op, \textit{class-name})], \kappa').$$

The last thrown exception object value is put in the dedicated exception
register, exn.

If the exception type does not match, or the top frame is a call frame, then
handle transitions to a configuration with the control state unchanged but with
the top frame popped:

$$\begin{aligned}
\mathit{handle}(\text{\ae}, \vec{s}, fp, \sigma, \mathbf{handle}(\textit{class-name}', \textit{label}) : \kappa') &= ((\texttt{throw}\ \text{\ae}) : \vec{s}, fp, \sigma, \kappa') \\
\mathit{handle}(\text{\ae}, \vec{s}, fp, \sigma, \mathbf{fun}(fp', \vec{s}') : \kappa') &= ((\texttt{throw}\ \text{\ae}) : \vec{s}, fp, \sigma, \kappa').
\end{aligned}$$

The abstraction of these "multi-pop" transition relations will require
modification of the algorithm used for control-state reachability (Section 6.1).
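To make the peeling behavior concrete, here is a small Python sketch of handle. It is an illustration under stated assumptions rather than the paper's implementation: the stack is a tuple with the top frame first, the class hierarchy `PARENT` is a hypothetical example, and the one-frame-at-a-time transitions are collapsed into a single loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fun:            # call frame: fun(fp, s)
    fp: int
    stmts: tuple

@dataclass(frozen=True)
class Handle:         # handler frame: handle(class-name, label)
    class_name: str
    label: str

# Hypothetical class hierarchy (child -> parent), for illustration only.
PARENT = {
    "IOException": "Exception",
    "ArithmeticException": "Exception",
    "Exception": "Throwable",
}

def is_subclass(cls, ancestor):
    """True if `cls` equals `ancestor` or inherits from it via PARENT."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False

def handle(exn_class, stack):
    """Pop frames until a compatible handler is on top.

    Returns (label, remaining_stack) for the matching handler, or None
    when the stack is exhausted (an uncaught exception).
    """
    while stack:
        top, rest = stack[0], stack[1:]
        if isinstance(top, Handle) and is_subclass(exn_class, top.class_name):
            return top.label, rest
        stack = rest  # mismatched handler or call frame: pop and retry
    return None

stack = (Handle("ArithmeticException", "L1"), Fun(0, ()), Handle("Exception", "L2"))
```

With this stack, an IOException skips the first handler and the call frame and lands at label L2, while an ArithmeticException is caught immediately at L1.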

3 Pushdown abstract semantics 

With the concrete semantics in place, it is time to abstract them into an anal- 
ysis. To achieve a pushdown analysis, we abstract less than we normally would. 
Specifically, we conduct a structural abstraction of the concrete state-space and 
leave the stack height unbounded rather than thread frames through the heap.

3.1 Abstract semantics 

Abstract interpretation is defined in terms of a structural abstraction of the
machine model of Section 2. The evaluation of a program is defined as the set
of abstract machine configurations reachable by an abstraction of the machine
transition relation. Largely, abstract evaluation,
$\hat{\mathcal{E}} : \mathsf{Stmt}^* \to \mathcal{P}(\widehat{\mathit{Conf}})$, mimics
its concrete counterpart:

$$\hat{\mathcal{E}}(\vec{s}) = \{ \hat{c} : \hat{\mathcal{I}}(\vec{s}) \leadsto^* \hat{c} \}.$$

Abstract evaluation is defined by the set of configurations reached by the
reflexive, transitive closure of the $(\leadsto)$ relation, which abstracts the
$(\Rightarrow)$ relation.



3.2 Abstract configuration-space 

Figure 3 details the abstract configuration-space. We assume the natural
element-wise, point-wise and member-wise lifting of a partial order across this
state-space.



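$$\begin{aligned}
\hat{c} \in \widehat{\mathit{Conf}} &= \mathsf{Stmt}^* \times \widehat{\mathit{FramePointer}} \times \widehat{\mathit{Store}} \times \widehat{\mathit{Kont}} && \text{[configurations]} \\
\hat{\sigma} \in \widehat{\mathit{Store}} &= \widehat{\mathit{Addr}} \to \widehat{\mathit{Val}} && \text{[stores]} \\
\hat{a} \in \widehat{\mathit{Addr}} &= \widehat{\mathit{RegAddr}} + \widehat{\mathit{FieldAddr}} && \text{[addresses]} \\
\widehat{ra} \in \widehat{\mathit{RegAddr}} &= \widehat{\mathit{FramePointer}} \times \mathsf{Reg} \\
\widehat{fa} \in \widehat{\mathit{FieldAddr}} &= \widehat{\mathit{ObjectPointer}} \times \mathsf{FieldName} \\
\hat{\kappa} \in \widehat{\mathit{Kont}} &= \widehat{\mathit{Frame}}^* && \text{[continuations]} \\
\hat{\phi} \in \widehat{\mathit{Frame}} &= \widehat{\mathit{CallFrame}} + \widehat{\mathit{HandlerFrame}} && \text{[stack frames]} \\
\hat{\chi} \in \widehat{\mathit{CallFrame}} &::= \mathbf{fun}(\widehat{fp}, \vec{s}) \\
\hat{\eta} \in \widehat{\mathit{HandlerFrame}} &::= \mathbf{handle}(\textit{class-name}, \textit{label}) \\
\hat{d} \in \widehat{\mathit{Val}} &= \mathcal{P}(\widehat{\mathit{ObjectValue}} + \mathit{String} + \mathbb{Z}) && \text{[abstract values]} \\
\widehat{ov} \in \widehat{\mathit{ObjectValue}} &= \widehat{\mathit{ObjectPointer}} \times \mathsf{ClassName} \\
\widehat{fp} \in \widehat{\mathit{FramePointer}} &\text{ is a finite set of frame pointers} && \text{[frame pointers]} \\
\widehat{op} \in \widehat{\mathit{ObjectPointer}} &\text{ is a finite set of object pointers} && \text{[object pointers]}
\end{aligned}$$

Fig. 3: The abstract configuration-space.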



To synthesize the abstract state-space, we force frame pointers and object 
pointers (and thus addresses) to be a finite set, but crucially, we leave the stack 
untouched. When we compact the set of addresses into a finite set, the machine 
may run out of addresses to allocate, and when it does, the pigeon-hole principle 
will force multiple abstract values to reside at the same address. As a result, we 
have no choice but to force the range of the Store to become a power set in the 
abstract configuration-space. 
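The consequence of a power-set store is that abstract updates join rather than overwrite. A minimal Python sketch, assuming a store represented as a dictionary from addresses to frozen sets of abstract values (the representation and the name `join` are our illustrative choices):

```python
def join(store, addr, values):
    """Return a new store in which `addr` additionally holds `values`.

    This is a weak update: values already at `addr` are kept, because
    several concrete addresses may share this one abstract address.
    """
    new_store = dict(store)
    new_store[addr] = new_store.get(addr, frozenset()) | frozenset(values)
    return new_store

store = {}
store = join(store, ("fp0", "v1"), {("op0", "Foo")})
store = join(store, ("fp0", "v1"), {("op1", "Bar")})
```

After the two joins, the register address `("fp0", "v1")` holds both object values; this merging is the source of imprecision that abstract garbage collection later tries to reduce.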



3.3 Abstract transition relation 



The abstract transition relation has components analogous to those from the 
concrete semantics: 

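- $\hat{\mathcal{I}} : \mathsf{Stmt}^* \to \widehat{\mathit{Conf}}$ injects a sequence of instructions into a configuration:

  $\hat{c}_0 = \hat{\mathcal{I}}(\vec{s}) = (\vec{s}, \widehat{fp}_0, [], \langle\rangle)$.

- $\hat{\mathcal{A}} : \mathsf{AExp} \times \widehat{\mathit{FramePointer}} \times \widehat{\mathit{Store}} \rightharpoonup \widehat{\mathit{Val}}$ evaluates atomic expressions:

  $\hat{\mathcal{A}}(name, \widehat{fp}, \hat{\sigma}) = \hat{\sigma}(\widehat{fp}, name)$ [variable look-up].

- $\hat{\mathcal{A}}_{\mathcal{F}} : \mathsf{AExp} \times \widehat{\mathit{FramePointer}} \times \widehat{\mathit{Store}} \times \mathsf{FieldName} \rightharpoonup \widehat{\mathit{Val}}$ looks up fields:

  $\hat{\mathcal{A}}_{\mathcal{F}}(\text{\ae}, \widehat{fp}, \hat{\sigma}, \textit{field-name}) = \bigsqcup \hat{\sigma}(\widehat{op}, \textit{field-name})$ [field look-up],

  where $(\widehat{op}, \textit{class-name}) \in \hat{\mathcal{A}}(\text{\ae}, \widehat{fp}, \hat{\sigma})$.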

Because there are an infinite number of abstract configurations, a naive imple- 
mentation of the £ function may not terminate. 

The appendix discusses abstractions of allocFP and allocOP that allow the
selection of different analyses such as k-CFA or polymorphic splitting.

The rules for the abstract transition relation
$(\leadsto) \subseteq \widehat{\mathit{Conf}} \times \widehat{\mathit{Conf}}$ largely
mimic the structure of the concrete relation $(\Rightarrow)$. The biggest difference
is that the structural abstraction forces the abstract transition to become
nondeterministic. We detail these rules below and illustrate the differences from
their concrete counterparts. Again, we only cover rules involving objects and
exceptions. The appendix contains the remaining rules.

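New object creation Creating an object allocates a potentially non-fresh
object pointer and joins the newly initialized object into that store location:

$$\overbrace{((\texttt{assign}\ name\ (\texttt{new}\ \textit{class-name})) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa})}^{\hat{c}} \leadsto (\vec{s}, \widehat{fp}, \hat{\sigma}'', \hat{\kappa}), \text{ where}$$

$$\begin{aligned}
\widehat{op} &= \widehat{\mathit{allocOP}}(\hat{c}) \\
\hat{\sigma}' &= \hat{\sigma} \sqcup [(\widehat{fp}, name) \mapsto (\widehat{op}, \textit{class-name})] \\
\hat{\sigma}'' &= \widehat{\mathit{initObject}}(\hat{\sigma}', \textit{class-name}),
\end{aligned}$$

where the helper $\widehat{\mathit{initObject}} : \widehat{\mathit{Store}} \times \mathsf{ClassName} \to \widehat{\mathit{Store}}$ initializes fields.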

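Instance field reference/update Referencing a field uses $\hat{\mathcal{A}}_{\mathcal{F}}$ to evaluate the
field values and joins them into the store at the destination register:

$$((\texttt{field-get}\ name\ \text{\ae}\ \textit{field-name}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}) \leadsto (\vec{s}, \widehat{fp}, \hat{\sigma}', \hat{\kappa}), \text{ where}$$

$$\hat{\sigma}' = \hat{\sigma} \sqcup [(\widehat{fp}, name) \mapsto \hat{\mathcal{A}}_{\mathcal{F}}(\text{\ae}, \widehat{fp}, \hat{\sigma}, \textit{field-name})].$$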



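Updating a field first finds the abstract object values in the store, extracts the
object pointer from each possible value, pairs each object pointer with the field
name to get the field addresses, and finally joins the extensions into the store:

$$((\texttt{field-put}\ \text{\ae}\ \textit{field-name}\ \text{\ae}_v) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}) \leadsto (\vec{s}, \widehat{fp}, \hat{\sigma}', \hat{\kappa}), \text{ where}$$

$$\begin{aligned}
\hat{\sigma}' &= \hat{\sigma} \sqcup [(\widehat{op}, \textit{field-name}) \mapsto \hat{\mathcal{A}}(\text{\ae}_v, \widehat{fp}, \hat{\sigma})] \\
(\widehat{op}, \textit{class-name}) &\in \hat{\mathcal{A}}(\text{\ae}, \widehat{fp}, \hat{\sigma}).
\end{aligned}$$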

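Method invocation Like the concrete semantics, method invocation also
involves all four components of the machine. The main difference is that, for
non-static method invocation, there can be a set of possible receiver objects,
rather than only one as in the concrete counterpart. This also means that there
could be multiple method definitions resolved, one for each object. For each such
method m:

$$\overbrace{((\textit{invoke-kind}\ (\text{\ae}_0 \ldots \text{\ae}_n)\ (type_0 \ldots type_n)) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa})}^{\hat{c}} \leadsto \widehat{\mathit{applyMethod}}(m, \vec{\text{\ae}}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}),$$

where

$$\widehat{\mathit{applyMethod}}(m, \vec{\text{\ae}}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}) = (\vec{s}, \widehat{fp}', \hat{\sigma}', (\widehat{fp}, \vec{s}) : \hat{\kappa}), \text{ where}$$

$$\begin{aligned}
\widehat{fp}' &= \widehat{\mathit{allocFP}}(\hat{c}) \\
\hat{\sigma}' &= \hat{\sigma} \sqcup [(\widehat{fp}', name_i) \mapsto \hat{\mathcal{A}}(\text{\ae}_i, \widehat{fp}, \hat{\sigma})].
\end{aligned}$$

Procedure return Procedure return pops off the top-most fun frame:

$$((\texttt{return}\ \text{\ae}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \mathbf{fun}(\widehat{fp}', \vec{s}') : \hat{\kappa}) \leadsto (\vec{s}', \widehat{fp}', \hat{\sigma}', \hat{\kappa}), \text{ where}$$

$$\hat{\sigma}' = \hat{\sigma} \sqcup [(\widehat{fp}', \texttt{ret}) \mapsto \hat{\mathcal{A}}(\text{\ae}, \widehat{fp}, \hat{\sigma})].$$

If the top frame is a handle frame, the abstract interpreter pops until the
top-most frame is a fun frame:

$$((\texttt{return}\ \text{\ae}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \mathbf{handle}(\textit{class-name}, \textit{label}) : \hat{\kappa}) \leadsto ((\texttt{return}\ \text{\ae}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}).$$

Pushing and popping handlers Handlers push and pop as expected:

$$((\texttt{push-handler}\ \textit{class-name}\ \textit{label}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}) \leadsto (\vec{s}, \widehat{fp}, \hat{\sigma}, \mathbf{handle}(\textit{class-name}, \textit{label}) : \hat{\kappa}),$$

$$((\texttt{pop-handler}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \mathbf{handle}(\textit{class-name}, \textit{label}) : \hat{\kappa}) \leadsto (\vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}).$$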



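Throwing and catching exceptions The throw statement peels away layers
of the stack until it finds a matching exception handler:

$$((\texttt{throw}\ \text{\ae}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}) \leadsto \widehat{\mathit{handle}}(\text{\ae}, \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}),$$

where the function
$\widehat{\mathit{handle}} : \mathsf{AExp} \times \mathsf{Stmt}^* \times \widehat{\mathit{FramePointer}} \times \widehat{\mathit{Store}} \times \widehat{\mathit{Kont}} \rightharpoonup \widehat{\mathit{Conf}}$
behaves like its concrete counterpart when the top-most frame is a compatible
handler:

$$\widehat{\mathit{handle}}(\text{\ae}, \vec{s}, \widehat{fp}, \hat{\sigma}, \mathbf{handle}(\textit{class-name}', \textit{label}) : \hat{\kappa}') = (\mathcal{S}(\textit{label}), \widehat{fp}, \hat{\sigma} \sqcup [(\widehat{fp}, \texttt{exn}) \mapsto (\widehat{op}, \textit{class-name})], \hat{\kappa}').$$

Otherwise, it pops a frame:

$$\begin{aligned}
\widehat{\mathit{handle}}(\text{\ae}, \vec{s}, \widehat{fp}, \hat{\sigma}, \mathbf{handle}(\_, \_) : \hat{\kappa}') &= ((\texttt{throw}\ \text{\ae}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}') \\
\widehat{\mathit{handle}}(\text{\ae}, \vec{s}, \widehat{fp}, \hat{\sigma}, \mathbf{fun}(\_, \_) : \hat{\kappa}') &= ((\texttt{throw}\ \text{\ae}) : \vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}').
\end{aligned}$$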

4 The shift: From abstract CESK to pushdown systems

In the previous section, we constructed an infinite-state abstract interpretation of 
the CESK-like machine to analyze exception flows for object-oriented languages.
The infinite-state nature of the abstraction makes it difficult to answer static 
analysis questions: How do you compute the reachable states if there are an 
infinite number of them? Fortunately, a shift in perspective reveals that the 
machine is in fact a pushdown system for which control-state reachability is 
decidable. 

If we take $\mathsf{Stmt}^* \times \widehat{\mathit{FramePointer}} \times \widehat{\mathit{Store}}$ as the finite set of control states
and $\widehat{\mathit{Kont}}$ as the set of stacks, then it is immediately apparent that the abstract
semantics that we have created is a pushdown system. This is the object-oriented
analog of Earl et al.'s observation for the λ-calculus [10]. This shift permits
the use of a control-state reachability algorithm in place of exhaustive search
of the configuration-space. [A figure in the appendix defines the program-to-RPDS
conversion function $\widehat{\mathcal{PDS}} : \mathsf{Stmt}^* \to \mathbb{RPDS}$ in detail.]

4.1 Mini-evaluation 

In Table EJ when we compare the resulting analysis to Bravenboer and Smarag- 
dakis's finite-state analysis of exceptions [6], we find a solid improvement in 
precision, but a substantial slowdown in time. This is not surprising: computing 
the reachable states in a pushdown system is cubic in the number of states. In the 
next section, we improve the running time by porting another powerful technique 
from abstract interpretation of the λ-calculus: abstract garbage collection.



5 Abstract garbage collection for objects 



Abstract garbage collection is known to yield order-of-magnitude improvements 
in precision, even as it drops run-times by cutting away false positives. Adapting 
abstract garbage collection seemed like the right tool to fix the performance 
problem of the previous section. We directly benefit from that line of work on
the λ-calculus, which developed a class of introspective pushdown machines as
a means of combining pushdown analysis with abstract garbage collection [10].
Introspective pushdown systems are pushdown systems that have read access
to the entire stack during a transition. Since the root set for garbage collection
depends on the entire stack, we need an introspective pushdown system to
use abstract garbage collection. [The appendix formalizes the injection into an
introspective pushdown system.]

It's natural to think that the combined technique will benefit exception-flow
analysis for object-oriented languages. However, as we shall demonstrate, we 
must conduct a careful and subtle redesign of the abstract garbage collection 
machinery for object-oriented languages to gain the promised analysis precision 
and performance. 

In the following, we present how to adapt abstract garbage collection to
work under the abstract semantics defined in Section 3. Abstract garbage
collection discards unreachable elements from the store. It modifies the
transition relation to conduct a "stop-and-copy" garbage collection before each
transition. To do so, we define a garbage collection function
$\hat{G} : \widehat{\mathit{Conf}} \to \widehat{\mathit{Conf}}$ on configurations:



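$$\hat{G}(\overbrace{\vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}}^{\hat{c}}) = (\vec{s}, \widehat{fp}, \hat{\sigma} | \mathit{Reachable}(\hat{c}), \hat{\kappa}),$$

where the pipe operation $f|S$ yields the function $f$, but with inputs not in the
set $S$ mapped to bottom (the empty set). The reachability function
$\mathit{Reachable} : \widehat{\mathit{Conf}} \to \mathcal{P}(\widehat{\mathit{Addr}})$ first computes the root set, and then the
transitive closure of an address-to-address adjacency relation:

$$\mathit{Reachable}(\overbrace{\vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}}^{\hat{c}}) = \{ \hat{a} : \hat{a}_0 \in \mathit{Root}(\hat{c}) \text{ and } \hat{a}_0 \to^*_{\hat{\sigma}} \hat{a} \},$$

where the function $\mathit{Root} : \widehat{\mathit{Conf}} \to \mathcal{P}(\widehat{\mathit{Addr}})$ finds the root addresses:

$$\mathit{Root}(\vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}) = \{ (\widehat{fp}, r) : (\widehat{fp}, r) \in \mathit{dom}(\hat{\sigma}) \} \cup \mathit{StackRoot}(\hat{\kappa}).$$

The $\mathit{StackRoot} : \widehat{\mathit{Kont}} \to \mathcal{P}(\widehat{\mathit{Addr}})$ function finds roots down the stack.
However, only CallFrame has the component to construct addresses, so we define
a helper function $\mathcal{F} : \widehat{\mathit{Kont}} \to \widehat{\mathit{CallFrame}}^*$ to extract only the call frames from
the stack and skip over all the handle frames. Now StackRoot is defined as

$$\mathit{StackRoot}(\hat{\kappa}) = \{ (\widehat{fp}_i, r) : (\widehat{fp}_i, r) \in \mathit{dom}(\hat{\sigma}) \text{ and } \widehat{fp}_i \in \mathcal{F}(\hat{\kappa}) \},$$

and the relation

$$(\to) \subseteq \widehat{\mathit{Addr}} \times \widehat{\mathit{Store}} \times \widehat{\mathit{Addr}}$$

connects adjacent addresses:

$$\hat{a} \to_{\hat{\sigma}} \hat{a}' \text{ iff there exists } (\widehat{op}, \textit{class-name}) \in \hat{\sigma}(\hat{a}) \text{ such that } \hat{a}' \in \{ (\widehat{op}, \textit{field-name}) : (\widehat{op}, \textit{field-name}) \in \mathit{dom}(\hat{\sigma}) \}.$$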
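The definitions above can be sketched in Python as a root computation, a transitive closure, and a store restriction. The encoding is an assumption made for illustration: register addresses are tuples `("reg", fp, name)`, field addresses are `("field", op, field_name)`, object values are `("obj", op, class_name)`, and the frame pointers of the call frames on the stack are passed in directly as `stack_fps`.

```python
def roots(fp, stack_fps, store):
    """Register addresses of the current frame and of every call frame on the stack."""
    live_fps = {fp, *stack_fps}
    return {a for a in store if a[0] == "reg" and a[1] in live_fps}

def reachable(root_set, store):
    """Transitive closure of the address-to-address adjacency relation."""
    seen = set()
    work = list(root_set)
    while work:
        a = work.pop()
        if a in seen:
            continue
        seen.add(a)
        for v in store.get(a, ()):
            # Only object values point onward, to the fields of that object.
            if isinstance(v, tuple) and v[0] == "obj":
                for b in store:
                    if b[0] == "field" and b[1] == v[1] and b not in seen:
                        work.append(b)
    return seen

def collect(fp, stack_fps, store):
    """Restrict the store to the reachable addresses."""
    keep = reachable(roots(fp, stack_fps, store), store)
    return {a: vs for a, vs in store.items() if a in keep}

store = {
    ("reg", "fp1", "x"): {("obj", "o1", "A")},
    ("field", "o1", "f"): {("obj", "o2", "B")},
    ("field", "o2", "g"): {7},
    ("reg", "fp9", "y"): {("obj", "o3", "C")},
    ("field", "o3", "h"): {1},
}
collected = collect("fp1", [], store)
```

In this example the register of frame `fp9` and the object it points to are dropped when `fp9` is not on the stack, and retained when it is.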

Example runs with abstract garbage collection Table 1 presents example
results of running pushdown analysis with and without abstract garbage
collection, as described. It shows that abstract garbage collection further
improves precision, but the effect is not as large as we had predicted, especially
with respect to analysis time: on functional programs, abstract garbage
collection can bring order-of-magnitude reductions in both imprecision and time.
The next section teases out the problem and develops a solution: combining
abstract garbage collection with live-range analysis.



Benchmark  Opts   Nodes  Edges   VarPointsTo  E-C links  Time (sec)
lusearch   pdcfa  91574  105154  (1423, 3)    76         5520
           +gc    26365  30426   (1086, 2)    63         4800

Table 1: Example analysis results from the (introspective) pushdown system.
VarPointsTo measures how many objects a variable can possibly point to; it is
presented as a tuple (a, b), where a is the total number of entries and b is the
average number of objects being invoked on. E-C links is the number of pairs of
an instruction that can throw an exception and a handler that can possibly handle
that exception. These metrics are used by Fu et al. [11] and Bravenboer and
Smaragdakis [7].



5.1 Live register analysis (LRA) for AGC 

Even though pushdown analysis, with or without garbage collection, promises to
increase analysis precision, the analysis time is not satisfying, as shown in
Table 1. The benchmark lusearch with abstract garbage collection still takes more
than an hour. By manually inspecting some other benchmarks we have run,
we find that in register-based bytecode the same register is often reassigned
multiple times at different sites within a method. As a result, abstract object
values are unnecessarily "merged" together, extra state space is explored, and
analysis time is prolonged.

The direct adaptation of AGC to an object-oriented setting in Section 5 cannot
collect these registers between uses. For object-oriented programs, we want to
collect registers that are still reachable but will not be read again before an
intervening assignment.

As it turns out, the fix for this problem is a classic data-flow analysis:
live-register analysis (LRA). LRA computes the set of registers that are alive at
each statement within a method. The garbage collector can then more precisely
collect each frame.

Since LRA is well-defined in the literature [1], we skip the formalization
here, but Root is now modified to collect only the live registers of the current
statement, $\mathit{Lives}(s_0)$:

$$\mathit{Root}(\vec{s}, \widehat{fp}, \hat{\sigma}, \hat{\kappa}) = \{ (\widehat{fp}, r') : (\widehat{fp}, r') \in \mathit{dom}(\hat{\sigma}) \text{ and } r' \in \mathit{Lives}(s_0) \} \cup \mathit{StackRoot}(\hat{\kappa}).$$
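For readers who want a reminder of how the live sets can be computed, here is a textbook backward data-flow sketch in Python. It is not the paper's implementation; we assume each statement is summarized by its defined and used registers, that `succ` lists successor statement indices, and that the live set of a statement means the registers live immediately before it.

```python
def live_registers(stmts, succ):
    """Backward liveness analysis.

    stmts: list of (defs, uses) pairs, each a set of register names.
    succ:  list of successor index lists, one per statement.
    Returns the set of registers live immediately before each statement.
    """
    live_in = [set() for _ in stmts]
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for i in reversed(range(len(stmts))):
            defs, uses = stmts[i]
            live_out = set()
            for j in succ[i]:
                live_out |= live_in[j]
            new_in = uses | (live_out - defs)
            if new_in != live_in[i]:
                live_in[i] = new_in
                changed = True
    return live_in

# 0: v0 = new A;  1: v1 = v0.f;  2: v0 = new B;  3: return v1
stmts = [({"v0"}, set()), ({"v1"}, {"v0"}), ({"v0"}, set()), (set(), {"v1"})]
succ = [[1], [2], [3], []]
lives = live_registers(stmts, succ)
```

At statement 2 the register v0 still holds a reachable object, but it is about to be overwritten, so it is not live there and the modified Root would not treat it as a root.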

Section 7 presents the complete results on the benchmark suite for the joint
analysis (denoted as +gc+lra in Table 2).

6 Extending pushdown reachability to exceptions 

With the formalism in previous sections, it is not hard to translate the ab- 
stract semantics into working code. We use the Dyck State Graph synthesis 
algorithm — a purely functional version of the Summarization algorithm — for 
computing reachable pushdown control states [10] . 

6.1 Synthesizing a Dyck State Graph with exceptional flow 

The Dyck State Graph (DSG) of a pushdown system is the subset of a pushdown
system reachable over legal paths. (A path is legal if it never tries to pop
a frame $\phi$ when a frame other than $\phi$ is on top of the stack.) To synthesize
a DSG from an (introspective) pushdown system, Earl et al. present an
efficient, functional modification of the pushdown summarization algorithm [10].
The algorithm iteratively constructs the reachable portion of the pushdown
transition relation by inserting ε-summary edges whenever it finds empty-stack
paths (e.g., push a, push b, pop b, pop a) between control states.

For pushdown analysis without exception handling, only two kinds of
transitions can cause a change to the set of ε-predecessors: an intraprocedural
empty-stack transition and a frame-popping procedure return. With the addition of
handle frames to the stack, there are several new cases to consider for popping
frames (and hence adding ε-edges).

The following subsections highlight how to handle exceptional flow during
DSG synthesis, particularly as it relates to maintaining ε-summary edges. The
figures in these sections use a graphical scheme for describing the cases for
ε-edge insertion. Existing edges are solid lines, while the ε-summary edges to be
added are dotted lines.
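To illustrate what an ε-summary edge is, the following Python sketch saturates a small, explicitly given pushdown graph. It is a naive fixed-point computation written for clarity, not the worklist-based algorithm of Earl et al., and it does not restrict itself to states reachable from an initial state. The edge encoding (`("eps",)`, `("push", f)`, `("pop", f)`) is our assumption.

```python
def epsilon_summaries(states, edges):
    """Compute pairs (p, q) joined by a path with no net stack change.

    edges: set of (src, action, dst), where action is ("eps",),
    ("push", frame) or ("pop", frame).
    """
    eps = {(q, q) for q in states}                         # empty path
    eps |= {(p, q) for (p, a, q) in edges if a[0] == "eps"}
    pushes = [(p, a[1], q) for (p, a, q) in edges if a[0] == "push"]
    pops = [(p, a[1], q) for (p, a, q) in edges if a[0] == "pop"]
    changed = True
    while changed:
        changed = False
        new = set()
        for (p, q) in eps:                                 # transitivity
            for (q2, r) in eps:
                if q == q2:
                    new.add((p, r))
        for (p, f, q) in pushes:                           # push f ... pop f
            for (q2, f2, r) in pops:
                if f == f2 and (q, q2) in eps:
                    new.add((p, r))
        if not new <= eps:
            eps |= new
            changed = True
    return {(p, q) for (p, q) in eps if p != q}            # drop trivial pairs

# push a, push b, pop b, pop a
nested = {
    ("q0", ("push", "a"), "q1"),
    ("q1", ("push", "b"), "q2"),
    ("q2", ("pop", "b"), "q3"),
    ("q3", ("pop", "a"), "q4"),
}
summary = epsilon_summaries({"q0", "q1", "q2", "q3", "q4"}, nested)
```

For the nested example the sketch adds a summary edge from q1 to q3 (across push b, pop b) and, once that exists, from q0 to q4. An intraprocedural push-handler followed by a pop-handler is the same pattern with a handle frame in place of a or b.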

Intraprocedural push/pop of handle frames The simplest case is entering 
a try block (a push-handler) and leaving a try block (a pop-handler) entirely 
intraprocedurally — without throwing an exception. Figure 2] shows such a case: 
if there is a handler push followed by a handler pop, the synthesized (dotted) 
edge must be added. 



Locally caught exceptions Figure [5] presents a case where a local handler 
catches an exception, popping it off the stack and continuing. 



Exception propagation along the stack Figure [5] illustrates a case where an 
exception is not handled locally, and must pop off a call frame to reach the next 
handler on the stack. In this case, a popping self-edge from control state q' to q' 
lets the control state q' see frames beneath the top. Using popping self-edges, a 
single state can pop off as many frames as necessary to reach the handle — one 
at a time. 



Control transfers mixed in try /catch Figure [7] illustrates the situation 
where a procedure tries to return while a handle frame is on the top of the 
stack. It uses popping self-edges as well to find the top-most call frame. 



Uncaught exceptions The case in Figure |5] shows popping all frames back to 
the bottom of the stack — indicating an uncaught exception. 
7 Evaluation



We evaluated our pushdown exception flow analysis on standard Java bench- 
marks from the DaCapo suite [3] that we were able to port to Android; we 
have also used some native Android applications. We ran these benchmarks on
a machine running OS X 10.8.2 with 64 GB of DDR3 memory and two six-core
3.07 GHz Intel Xeon X5675 CPUs. Table 2 lists the results for all applications.
To compare, we adopt metrics (and implementations) used by previous work
[11, 7] for object-oriented programs:

— VarPointsTo: Given a variable, to how many types may it point? Smaller 
sets indicate higher data-flow precision. 

— ThrowPointsTo: At a throw, how many types of exceptions could be thrown? 
Smaller sets indicate higher data-flow precision. 

— Exception-Catch-Link (E-C Link): A pair of instructions in which the second
catches the exception thrown by the first. Fewer E-C links indicate higher
exception-flow precision.
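As a rough illustration of how these metrics can be tabulated, the sketch below computes the (total entries, average set size) pair used for VarPointsTo and ThrowPointsTo, and collects E-C links as (throw site, catch site) pairs. The input encodings (`points_to`, `thrown`, `caught_by`) and the function names are assumptions made for this example; they are not the data structures of our implementation or of Doop.

```python
from typing import Dict, Set, Tuple


def summarize(points_to: Dict[str, Set[str]]) -> Tuple[int, float]:
    """Return (total entries, average number of types per entry)."""
    entries = len(points_to)
    if entries == 0:
        return 0, 0.0
    return entries, sum(len(types) for types in points_to.values()) / entries


def ec_links(thrown: Dict[str, Set[str]],
             caught_by: Dict[Tuple[str, str], Set[str]]) -> Set[Tuple[str, str]]:
    """Collect (throw site, catch site) pairs.

    thrown maps a throw site to the exception types it may throw;
    caught_by maps (throw site, exception type) to the catch sites that the
    analysis reports for that exception.
    """
    return {(site, handler)
            for site, types in thrown.items()
            for exc in types
            for handler in caught_by.get((site, exc), set())}
```

Smaller averages from `summarize` and fewer pairs from `ec_links` correspond to higher precision in the sense defined above.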

The results on Android applications of different sizes already demonstrate the
promise of our analytic techniques: averages of one to three for VarPointsTo
and ThrowPointsTo, and a small number of E-C links.

The evaluation on standard Java benchmarks lets us compare our techniques
against prior work. We use the same version of the benchmark suite as [7]: the
DaCapo benchmark programs, v. 2006-10-MR2. However, only antlr, lucene, and pmd
run as Dalvik bytecode, because classes and interfaces in the Android SDK clash
in name with those originally defined in the Java SDK.

We contacted the authors for access to the original tool, Doop [7], to run the
above benchmarks and recompute the corresponding metrics. Specifically, we ran
Doop Revision 958 on JRE 1.5 and Xubuntu 12.10 inside VirtualBox 4.2.2. The
metrics we compute are VarPointsTo, E-C Links and analysis run time, with the
context-sensitivity option 1-Call+H and the object-sensitivity option 1-Obj+H,
respectively. These options are the closest to the allocation strategy in our
analysis: 1-call-site sensitivity for calls, and 1-object-sensitivity for
object allocation. To eliminate differences between Dalvik and Java bytecode,
the VarPointsTo metric counts how many types a method can be invoked on at
each call site.

The comparison results are shown in the first three rows of Table 2 (the
DaCapo benchmarks). We could not get Doop to operate properly on Android
programs.

The pushdown exception-flow analysis improves the precision of points-to
information and E-C links by almost two orders of magnitude over Doop on all
three benchmarks. We report running times for completeness, but these numbers
cannot be compared as directly as precision. Doop uses a high-performance
Datalog engine to solve flow constraints; our implementation in Scala is
asymptotically efficient but not optimized, so it incurs a significant
constant-factor overhead.

The effect on analysis time varies across benchmarks. Even taking into account
the difference in running environments, Doop requires less analysis time than
our analysis does. However, the co-analysis of pushdown systems and augmented
abstract garbage collection demonstrates the best precision/performance
trade-off.

In Table 2, adding garbage collection and the live-range analysis restriction
(+gc+lra) improves analysis time more significantly for Android applications
than for Java applications. The reason is that Android applications are more
sensitive to the LRA due to Android's multi-entry-point structure. However, the
results on the DaCapo benchmarks clearly indicate improvements over Doop in
precision.
Table 2: Benchmark results. VarPointsTo and Throws are presented as tuples
(a, b), where a is the total number of entries and b is the average number of
types invoked on (for VarPointsTo) or the average number of exception objects
thrown (for Throws). All times are in seconds; ∞ denotes that the analysis did
not finish within 6000 seconds.



8 Related Work 



Precise and scalable context-sensitive points-to analysis has been an open
problem for decades. Progress in general has been gradual, with results like
object-sensitivity [18, 19] intermittently providing a leap for most programs.
Most results target improvements for individual classes of programs. The
techniques we present here broadly target all programs, and they are orthogonal
to and compatible with results like object-sensitivity.

Much work in pointer analysis exploits methods to improve performance by
strategically reducing precision. Lattner et al. show that an analysis with a
context-sensitive heap abstraction can be efficient by sacrificing precision
under unification constraints [13].

In fully context-sensitive pointer analysis, the literature has sought context
abstractions that provide precise pointer information without sacrificing
performance. Milanova found that object-sensitivity [19] is an effective
context abstraction for object-oriented programs. This is confirmed by the
extensive evaluation by Lhoták [15]. He and other researchers have also argued
for using a context-sensitive heap abstraction to improve precision [20].

BDDs have been used to compactly represent the large amount of redundant data
in context-sensitive pointer analysis [3, 31, 33]. Specifically, Xu and
Rountev's work [33] reduces the redundancy by choosing the right context
abstractions. Such advancements could be applied to our pushdown framework,
as they are orthogonal to its central thesis.

Finite-state analysis of exceptions The main contribution of the paper is 
significantly improved analysis precision via pushdown systems that analyze the 
exceptional control-flow of object-oriented programs. 

The bulk of the previous literature has focused on finite-state abstractions for
Java programs, i.e., k-CFA and its variants. Specifically, the work that handles
exception flows is based on context-insensitivity or a limited form of
context-sensitivity, which cannot differentiate the contexts in which an
exception is thrown or determine precisely which handlers can handle it.
Robillard et al. [25] present a truly interprocedural exception-flow analysis,
but exceptions propagate via imprecise control flows obtained from class
hierarchy analysis. The same is true for Jo et al. [32] and its extension for
concurrent
Java programs [53]. Leroy and Pessaux Q3] use type systems to model excep- 
tions, specifically to analyze uncaught exceptions. Limited context-sensitivity 
is employed for the purpose of more precise results on polymorphic functions. 
Fu et al. [11] proposed the E-C link metric to evaluate exception-flow precision.
They also documented the exception handler matching problem caused by an 
imprecise control flow graph. They approach the problem by employing points-to 
information to refine control-flow reachability. Bravenboer and Smaragdakis [7]
propose to join points-to analysis and exception-flow analysis to improve
precision and analysis run time in their Doop framework, which builds on an
optimized Datalog analysis engine [8]. They have conducted an extensive
comparison of different options for polyvariance. With respect to points-to
precision and E-C links, theirs is the most precise and efficient
exception-flow analysis among prior work. We conduct our comparison against
their work and find that the pushdown approach yields a significant improvement
in precision, but that our run time is not comparable to theirs, partly due to
their mature optimization methodology for Datalog.

Pushdown analysis for the λ-calculus Vardoulakis and Shivers's CFA2 [30]
is the precursor to the pushdown control-flow analysis [9, . CFA2 is a table-driven 
summarization algorithm that exploits the balanced nature of calls and returns 
to improve return-flow precision in a control-flow analysis. While CFA2 uses a 
concept called "summarization," it is a summarization of execution paths of the 
analysis, roughly equivalent to Dyck state graphs. 

In terms of recovering precision, pushdown control-flow analysis [9] is the 
dual to abstract garbage collection: it focuses on the global interactions of con- 
figurations via transitions to precisely match push-pop/call-return, thereby elim- 
inating all return-flow merging. However, pushdown control-flow analysis does 
nothing to improve argument merging. 

This work directly draws on our previous work on pushdown analysis for
higher-order programs [9] and introspective pushdown systems (IPDS) for
higher-order programs [10]. IPDS tackled the challenge of incorporating abstract
garbage collection [17] into pushdown systems and improving the summarization
algorithm for efficiency. That work shows significant improvements in precision
and analysis time for the λ-calculus. We extend the introspective work in two
dimensions: (1) we generalize the framework (including abstract garbage
collection) to an object-oriented language, and (2) we adapt the Dyck state
graph synthesis algorithm to handle the new stack change behavior introduced by
exceptions.

CFL- and pushdown-reachability techniques In previous work, Earl et 
al. [10] develop a pushdown reachability algorithm suitable for the pushdown
systems that we generate. It essentially draws on CFL- and pushdown-reachability
analysis |5ll2l23l2~i] . For instance, e-closure graphs, or equivalent variants thereof, 
appear in many context-free-language and pushdown reachability algorithms. 
Dyck state graph synthesis is an attractive perspective on pushdown reachabil- 
ity because it is purely functional, and it allows targeted modifications to the 
algorithm. 

CFL-reachability techniques have also been used to compute classical finite- 
state abstraction CFAs [16] and type-based polymorphic control-flow analy- 
sis [22] . These analyses should not be confused with pushdown control-flow anal- 
ysis, which is computing a fundamentally different kind of CFA. 

Pushdown exception-flow analysis There is little work on pushdown analysis
for object-oriented languages as a whole. Sridharan and Bodik proposed a
demand-driven analysis for Java that selectively matches reads with writes to
object fields by using refinement [28]. They employ a refinement-based
CFL-reachability technique that refines calls and returns to valid matching
pairs, but approximates for recursive calls. They do not consider specific
applications of CFL-reachability to exception flow.

9 Conclusion 

Poor analysis of exceptions pollutes the interprocedural control-flow analysis of
a program. In order to model exceptional control-flow precisely, we abandoned
traditional finite-state approaches (e.g., k-CFA and its variants). In their
place, we generalized pushdown control-flow analysis from the λ-calculus [10] to
object-oriented programs, and made it capable of handling exceptions in the
process. Pushdown control-flow analysis models the program stack (precisely)
with the pushdown stack. Computing the reachable control states of the pushdown
system (its Dyck state graph) yields a combined data- and control-flow analysis
of a program. Comparing this approach to the state of the art [6] shows
substantially improved precision. To improve time, we adapted abstract garbage
collection to object-oriented program analysis. The end result is an
improvement in data- and control-flow precision of roughly two orders of
magnitude when soundly reasoning in the presence of exceptions.
A Optional Appendix



This appendix is not required reading. We provide it for background, refreshment 
and deeper discussion. 



A.1 Notational conventions

We strive to stick to conventional notation and names wherever possible. For 
less common concepts, we define them here. 

The functional update operation $f[x \mapsto y]$ extends a function with a new binding:

$$f[x \mapsto y](x) = y$$
$$f[x \mapsto y](x') = f(x') \quad \text{if } x \neq x'.$$

For objects like stores, we lift the least upper bound $\sqcup$ point-wise:

$$(\sigma \sqcup \sigma')(a) = \sigma(a) \sqcup \sigma'(a).$$
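The two operations can be sketched over finite maps as follows. This is an illustration only: it assumes functions are represented as Python dictionaries and that store values are sets joined by union, and the names `update` and `join` are chosen for this example.

```python
from typing import Dict, FrozenSet, Hashable

Store = Dict[Hashable, FrozenSet]


def update(f: dict, x, y) -> dict:
    """Functional update f[x -> y]: a new map, the original is left unchanged."""
    g = dict(f)
    g[x] = y
    return g


def join(s1: Store, s2: Store) -> Store:
    """Point-wise least upper bound of two stores whose values are sets."""
    return {a: s1.get(a, frozenset()) | s2.get(a, frozenset())
            for a in s1.keys() | s2.keys()}
```

An address missing from one store is treated as mapping to the empty set, which is the bottom element for set union.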

A.2 Additional concrete transition relations

Stepping over nops and labels: The simplest instruction, nop, does not
change any component of the configuration:

$$\overbrace{([\![\texttt{nop}]\!] : \vec{s}, fp, \sigma, \kappa)}^{c} \Rightarrow \overbrace{(\vec{s}, fp, \sigma, \kappa)}^{c'}.$$

The label and line statements share the same transition form.³



Unconditional jumps: This kind of statement forces the program to jump to the
target statement sequence:

$$\overbrace{([\![\texttt{(goto}\ label\texttt{)}]\!] : \vec{s}, fp, \sigma, \kappa)}^{c} \Rightarrow \overbrace{(\mathcal{S}(label), fp, \sigma, \kappa)}^{c'},$$

where the function $\mathcal{S} : \mathsf{Label} \to \mathsf{Stmt}^*$ maps a label to the statement sequence
starting with that label.

3 The line statement is mainly for instrumenting context information to the statements 
that are actually interpreted. 



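Conditionals The if-goto is not much more complicated than a nop or goto,
but it needs to evaluate the conditional expression:

$$([\![\texttt{(if}\ \text{\ae}\ \texttt{(goto}\ label\texttt{))}]\!] : \vec{s}, fp, \sigma, \kappa) \Rightarrow \begin{cases} (\vec{s}, fp, \sigma, \kappa) & \text{if } \mathcal{A}(\text{\ae}, fp, \sigma) = \mathit{false} \\ (\mathcal{S}(label), fp, \sigma, \kappa) & \text{otherwise.} \end{cases}$$

Atomic assignments Atomic assignment statements assign the value of an
atomic expression to a variable (register). This involves evaluating the
expression, calculating the frame address to modify and then updating the store:

$$([\![\texttt{(assign \%name}\ \text{\ae}\texttt{)}]\!] : \vec{s}, fp, \sigma, \kappa) \Rightarrow (\vec{s}, fp, \sigma', \kappa), \text{ where } \sigma' = \sigma[(fp, \texttt{\%name}) \mapsto \mathcal{A}(\text{\ae}, fp, \sigma)].$$

Note that a large set of instruction statements is transformed into assign form.
For example, (move-result %name) is transformed to the form (assign %name ret).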
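The four concrete rules above can be sketched as a small step function. This is a toy model, not our implementation: statements are encoded as tuples, an atomic expression is assumed to be either a register name (a string) or a literal, and the names `atomic_eval`, `label_map`, `step` and `run` are hypothetical.

```python
def atomic_eval(ae, fp, store):
    """A(ae, fp, store): look up a register in the current frame, or return a literal."""
    return store[(fp, ae)] if isinstance(ae, str) else ae


def label_map(program):
    """S: map each label to the statement sequence starting with that label."""
    program = tuple(program)
    return {s[1]: program[i:] for i, s in enumerate(program) if s[0] == "label"}


def step(state, S):
    """One concrete transition on a configuration (stmts, fp, store, kont)."""
    stmts, fp, store, kont = state
    head, rest = stmts[0], stmts[1:]
    op = head[0]
    if op in ("nop", "label", "line"):          # no component changes
        return (rest, fp, store, kont)
    if op == "goto":
        return (S[head[1]], fp, store, kont)
    if op == "if":
        _, ae, label = head
        if atomic_eval(ae, fp, store) is False:
            return (rest, fp, store, kont)      # fall through
        return (S[label], fp, store, kont)      # take the jump
    if op == "assign":
        _, name, ae = head
        new_store = dict(store)                 # functional update of the store
        new_store[(fp, name)] = atomic_eval(ae, fp, store)
        return (rest, fp, new_store, kont)
    raise ValueError("unsupported statement: %r" % (head,))


def run(program, fp="fp0"):
    """Step until no statements remain; return the final configuration."""
    S = label_map(program)
    state = (tuple(program), fp, {}, ())
    while state[0]:
        state = step(state, S)
    return state
```

The continuation component is carried along untouched because none of these four rules changes the stack.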



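A.3 Other abstract transition relations



— Stepping over nops and labels: The abstract transition relation for this kind
of statement closely mirrors the corresponding concrete semantics (Section A.2):

$$\overbrace{([\![\texttt{nop}]\!] : \vec{s}, \hat{fp}, \hat{\sigma}, \hat{\kappa})}^{\hat{c}} \rightsquigarrow \overbrace{(\vec{s}, \hat{fp}, \hat{\sigma}, \hat{\kappa})}^{\hat{c}'}.$$

The label and line statements have the same right-hand side.

— Unconditional jumps: As with nop and label statements, the rule differs from
the concrete one only in that abstract components replace concrete ones:

$$([\![\texttt{(goto}\ label\texttt{)}]\!] : \vec{s}, \hat{fp}, \hat{\sigma}, \hat{\kappa}) \rightsquigarrow (\mathcal{S}(label), \hat{fp}, \hat{\sigma}, \hat{\kappa}).$$

— Conditionals: The if-goto is not much more complicated than a nop or goto,
but it needs to evaluate the conditional expression:

$$([\![\texttt{(if}\ \text{\ae}\ \texttt{(goto}\ label\texttt{))}]\!] : \vec{s}, \hat{fp}, \hat{\sigma}, \hat{\kappa}) \rightsquigarrow \begin{cases} (\vec{s}, \hat{fp}, \hat{\sigma}, \hat{\kappa}) & \text{if } \mathit{false} \in \hat{\mathcal{A}}(\text{\ae}, \hat{fp}, \hat{\sigma}) \\ (\mathcal{S}(label), \hat{fp}, \hat{\sigma}, \hat{\kappa}) & \text{otherwise.} \end{cases}$$

— Atomic assignments: The main change in the abstract transition relation
for atomic assignments lies in how the store component is updated:

$$([\![\texttt{(assign \%name}\ \text{\ae}\texttt{)}]\!] : \vec{s}, \hat{fp}, \hat{\sigma}, \hat{\kappa}) \rightsquigarrow (\vec{s}, \hat{fp}, \hat{\sigma}', \hat{\kappa}), \text{ where } \hat{\sigma}' = \hat{\sigma} \sqcup [(\hat{fp}, \texttt{\%name}) \mapsto \hat{\mathcal{A}}(\text{\ae}, \hat{fp}, \hat{\sigma})].$$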
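For comparison with the concrete rules, the sketch below gives one possible reading of these abstract rules over the same tuple encoding of statements. It rests on several assumptions that go beyond the text: abstract values are sets, the abstract evaluator returns the set stored at an address, the conditional is read as producing both successors when the set contains `False` together with some other value, and the names `abs_eval` and `abs_step` are hypothetical.

```python
def abs_eval(ae, fp, store):
    """Abstract atomic evaluation: a set of possible values."""
    if isinstance(ae, str):
        return store.get((fp, ae), frozenset())
    return frozenset([ae])


def abs_step(state, S):
    """Return the list of abstract successors of (stmts, fp, store, kont)."""
    stmts, fp, store, kont = state
    head, rest = stmts[0], stmts[1:]
    op = head[0]
    if op in ("nop", "label", "line"):
        return [(rest, fp, store, kont)]
    if op == "goto":
        return [(S[head[1]], fp, store, kont)]
    if op == "if":
        _, ae, label = head
        values = abs_eval(ae, fp, store)
        succs = []
        if any(v is False for v in values):       # false is a possible value
            succs.append((rest, fp, store, kont))
        if any(v is not False for v in values):   # some non-false value is possible
            succs.append((S[label], fp, store, kont))
        return succs
    if op == "assign":
        _, name, ae = head
        addr = (fp, name)
        joined = dict(store)
        # Weak update: join the new values with whatever was already there.
        joined[addr] = store.get(addr, frozenset()) | abs_eval(ae, fp, store)
        return [(rest, fp, joined, kont)]
    raise ValueError("unsupported statement: %r" % (head,))
```

The assignment case is where the abstract semantics differs most visibly from the concrete one: the old binding is joined with the new one instead of being overwritten.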

A.4 Syntactic sugar for pushdown systems

When a triple $(x, \ell, x')$ is an edge in a labeled graph:

$$x \overset{\ell}{\rightarrowtail} x' \equiv (x, \ell, x').$$

Similarly, when a pair $(x, x')$ is a graph edge:

$$x \rightarrowtail x' \equiv (x, x').$$

We use both string and vector notation for sequences:

$$a_1 a_2 \ldots a_n \equiv \langle a_1, a_2, \ldots, a_n \rangle \equiv \vec{a}.$$

A.5 Stack actions, stack change and stack manipulation

Stacks are sequences over a stack alphabet $\Gamma$. To reason about stack
manipulation concisely, we first turn stack alphabets into "stack-action" sets;
each character represents a change to the stack: push, pop or no change.

For each character $\gamma$ in a stack alphabet $\Gamma$, the stack-action set $\Gamma_\pm$ contains a
push character $\gamma_+$; a pop character $\gamma_-$; and a no-stack-change indicator, $\epsilon$:

$$\begin{array}{rcll} g \in \Gamma_\pm & ::= & \epsilon & \text{[stack unchanged]} \\ & | & \gamma_+ \text{ for each } \gamma \in \Gamma & \text{[pushed } \gamma\text{]} \\ & | & \gamma_- \text{ for each } \gamma \in \Gamma & \text{[popped } \gamma\text{]}. \end{array}$$

In this paper, the symbol $g$ represents some stack action.

When we develop introspective pushdown systems, we are going to need
formalisms for easily manipulating stack-action strings and stacks. Given a
string of stack actions, we can compact it into a minimal string describing net
stack change. We do so through the operator $\lfloor \cdot \rfloor : \Gamma_\pm^* \to \Gamma_\pm^*$, which cancels out
opposing adjacent push-pop stack actions:

$$\lfloor \vec{g}\, \gamma_+ \gamma_-\, \vec{g}' \rfloor = \lfloor \vec{g}\, \vec{g}' \rfloor \qquad \lfloor \vec{g}\, \epsilon\, \vec{g}' \rfloor = \lfloor \vec{g}\, \vec{g}' \rfloor,$$

so that $\lfloor \vec{g} \rfloor = \vec{g}$, if there are no cancellations to be made in the string $\vec{g}$.

We can convert a net string back into a stack by stripping off the push symbols
with the stackify operator, $\lceil \cdot \rceil : \Gamma_\pm^* \rightharpoonup \Gamma^*$:

$$\lceil \gamma_+ \gamma'_+ \ldots \gamma^{(n)}_+ \rceil = \langle \gamma^{(n)}, \ldots, \gamma', \gamma \rangle,$$

and for convenience, $[\vec{g}] = \lceil \lfloor \vec{g} \rfloor \rceil$. Notice the stackify operator is defined for
strings containing only push actions.
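The net-change and stackify operators can be sketched directly on lists of actions. The encoding of actions as `(kind, symbol)` pairs and the function names `net`, `stackify` and `stack_of` are assumptions made for this illustration.

```python
EPS = ("eps", None)


def push(g):
    return ("push", g)


def pop(g):
    return ("pop", g)


def net(actions):
    """Net stack change: drop eps and cancel each push immediately followed by its pop."""
    out = []
    for kind, sym in actions:
        if kind == "eps":
            continue
        if kind == "pop" and out and out[-1] == ("push", sym):
            out.pop()                 # the pop cancels the pending push
        else:
            out.append((kind, sym))
    return out


def stackify(actions):
    """Turn a push-only string into a stack, top of stack first."""
    if any(kind != "push" for kind, _ in actions):
        raise ValueError("stackify is defined only for push-only strings")
    return [sym for _, sym in reversed(actions)]


def stack_of(actions):
    """The convenience operator: stackify applied to the net stack change."""
    return stackify(net(actions))
```

Scanning left to right with a pending list is enough to reach the fully cancelled string, because every cancellation exposes at most one new adjacent push-pop pair, and that pair is checked on the next pop.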



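A.6 Pushdown systems

A pushdown system is a triple $M = (Q, \Gamma, \delta)$ where:

1. $Q$ is a finite set of control states;

2. $\Gamma$ is a stack alphabet; and

3. $\delta \subseteq Q \times \Gamma_\pm \times Q$ is a transition relation.

The set $Q \times \Gamma^*$ is called the configuration-space of this pushdown system. We
use $\mathbb{PDS}$ to denote the class of all pushdown systems.

For the following definitions, let $M = (Q, \Gamma, \delta)$.

— The labeled transition relation $(\longmapsto_M) \subseteq (Q \times \Gamma^*) \times \Gamma_\pm \times (Q \times \Gamma^*)$
determines whether one configuration may transition to another while performing
the given stack action:

$$\begin{array}{lll} (q, \vec{\gamma}) \overset{\epsilon}{\underset{M}{\longmapsto}} (q', \vec{\gamma}) & \text{iff } q \overset{\epsilon}{\rightarrowtail} q' \in \delta & \text{[no change]} \\ (q, \gamma : \vec{\gamma}) \overset{\gamma_-}{\underset{M}{\longmapsto}} (q', \vec{\gamma}) & \text{iff } q \overset{\gamma_-}{\rightarrowtail} q' \in \delta & \text{[pop]} \\ (q, \vec{\gamma}) \overset{\gamma_+}{\underset{M}{\longmapsto}} (q', \gamma : \vec{\gamma}) & \text{iff } q \overset{\gamma_+}{\rightarrowtail} q' \in \delta & \text{[push].} \end{array}$$

— If unlabelled, the transition relation $(\longmapsto)$ checks whether any stack action
can enable the transition:

$$c \underset{M}{\longmapsto} c' \text{ iff } c \overset{g}{\underset{M}{\longmapsto}} c' \text{ for some stack action } g.$$

— For a string of stack actions $g_1 \ldots g_n$:

$$c_0 \overset{g_1 \ldots g_n}{\underset{M}{\longmapsto}} c_n \text{ iff } c_0 \overset{g_1}{\underset{M}{\longmapsto}} c_1 \overset{g_2}{\underset{M}{\longmapsto}} \cdots \overset{g_{n-1}}{\underset{M}{\longmapsto}} c_{n-1} \overset{g_n}{\underset{M}{\longmapsto}} c_n,$$

for some configurations $c_0, \ldots, c_n$.

— For the transitive closure:

$$c \underset{M}{\longmapsto^*} c' \text{ iff } c \overset{\vec{g}}{\underset{M}{\longmapsto}} c' \text{ for some action string } \vec{g}.$$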
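The labeled transition relation on configurations can be sketched as follows. The representation is an assumption for illustration: the transition relation is a list of `(source, (kind, symbol), target)` triples, a configuration is a pair of a control state and a tuple whose first element is the top of the stack, and the names `successors` and `run_actions` are hypothetical.

```python
def successors(delta, config):
    """All (action, next configuration) pairs enabled from config."""
    q, stack = config
    result = []
    for source, (kind, sym), target in delta:
        if source != q:
            continue
        if kind == "eps":                               # no change
            result.append(((kind, sym), (target, stack)))
        elif kind == "push":                            # push sym on top
            result.append(((kind, sym), (target, (sym,) + stack)))
        elif kind == "pop" and stack[:1] == (sym,):     # pop only if sym is on top
            result.append(((kind, sym), (target, stack[1:])))
    return result


def run_actions(delta, config, actions):
    """Configurations reachable from config by performing exactly this action string."""
    current = {config}
    for action in actions:
        current = {c2 for c in current
                   for a, c2 in successors(delta, c) if a == action}
    return current
```

A pop transition is silently disabled when the stack is empty or has a different symbol on top, which is exactly what separates a pushdown system from a finite-state graph over the same edges.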



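A.7 Rooted pushdown systems

A rooted pushdown system is a quadruple $(Q, \Gamma, \delta, q_0)$ in which $(Q, \Gamma, \delta)$ is a
pushdown system and $q_0 \in Q$ is an initial (root) state. $\mathbb{RPDS}$ is the class of all
rooted pushdown systems.

For a rooted pushdown system $M = (Q, \Gamma, \delta, q_0)$, we define the reachable-from-root
transition relation:

$$c \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} c' \text{ iff } (q_0, \langle\rangle) \underset{M}{\longmapsto^*} c \text{ and } c \overset{g}{\underset{M}{\longmapsto}} c'.$$

In other words, the root-reachable transition relation also makes sure that the
root control state can actually reach the transition.

We overload the root-reachable transition relation to operate on control states:

$$q \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} q' \text{ iff } (q, \vec{\gamma}) \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} (q', \vec{\gamma}') \text{ for some stacks } \vec{\gamma}, \vec{\gamma}'.$$

For both root-reachable relations, if we elide the stack-action label, then, as in
the un-rooted case, the transition holds if there exists some stack action that
enables the transition:

$$q \underset{M}{\longmapsto\!\!\!\!\to} q' \text{ iff } q \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} q' \text{ for some action } g.$$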

A.8 Computing reachability in pushdown systems

A pushdown flow analysis can be construed as computing the root-reachable
subset of control states in a rooted pushdown system, $M = (Q, \Gamma, \delta, q_0)$:

$$\{\, q : q_0 \underset{M}{\longmapsto\!\!\!\!\to} q \,\}$$

Reps et. al and many others provide a straightforward "summarization" al- 
gorithm to compute this set [5112123121] . Our preliminary report also offers a 
reachability algorithm tailored to higher-order programs [5]. 
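To make the idea of a summarization-style computation concrete, the sketch below computes the root-reachable control states of a small rooted pushdown system by saturation. It is a naive illustration written for this appendix, not the algorithm of any cited work: it assumes the same `(source, (kind, symbol), target)` encoding of transitions used for illustration elsewhere, it starts from the empty stack, and it recomputes everything on each pass rather than using a work list.

```python
def root_reachable(delta, q0):
    """Control states reachable from (q0, empty stack) in a rooted pushdown system."""
    states = {q0}
    edges = set()   # transitions already known to be reachable
    eps = set()     # (s, t): t is reachable from s with no net stack change
    changed = True
    while changed:
        changed = False
        for s, (kind, sym), t in delta:
            if s not in states:
                continue
            if kind == "pop":
                # A pop of sym is enabled at s only if some reachable push of sym
                # leads to s with no net stack change in between.
                sources = {p for (p, (k, g), p2) in edges
                           if k == "push" and g == sym
                           and (p2 == s or (p2, s) in eps)}
                if not sources:
                    continue
                new_eps = {(p, t) for p in sources}   # push ... pop nets to nothing
            elif kind == "eps":
                new_eps = {(s, t)}
            else:
                new_eps = set()
            edge = (s, (kind, sym), t)
            if edge not in edges or not new_eps <= eps or t not in states:
                edges.add(edge)
                eps |= new_eps
                states.add(t)
                changed = True
        # One round of transitive closure; the outer loop repeats it to a fixed point.
        for a, b in list(eps):
            for c, d in list(eps):
                if b == c and (a, d) not in eps:
                    eps.add((a, d))
                    changed = True
    return states
```

The `eps` relation plays the role of the summaries: it records which pairs of states are connected by a balanced sequence of pushes and pops, so that a pop can be matched to its push without enumerating stacks.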



A.9 Nondeterministic finite automata

In this work, we will need a finite description of all possible stacks at a given
control state within a rooted pushdown system. We will exploit the fact that
the set of stacks at a given control point is a regular language. Specifically,
we will extract a nondeterministic finite automaton accepting that language
from the structure of a rooted pushdown system. A nondeterministic finite
automaton (NFA) is a quintuple $M = (Q, \Sigma, \delta, q_0, F)$:

— $Q$ is a finite set of control states;

— $\Sigma$ is an input alphabet;

— $\delta \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$ is a transition relation;

— $q_0$ is a distinguished start state; and

— $F \subseteq Q$ is a set of accepting states.

We denote the class of all NFAs as $\mathbb{NFA}$.
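For readers who want the definition in executable form, the sketch below checks whether an NFA with ε-transitions accepts a word. The tuple encoding of the automaton, the use of `None` for ε, and the names `eps_closure` and `accepts` are assumptions of this example; the sketch does not show how the automaton is extracted from a pushdown system.

```python
def eps_closure(delta, states):
    """All states reachable from the given states using only epsilon transitions."""
    seen = set(states)
    todo = list(states)
    while todo:
        q = todo.pop()
        for source, symbol, target in delta:
            if source == q and symbol is None and target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def accepts(nfa, word):
    """Decide whether the NFA (Q, Sigma, delta, q0, F) accepts the word."""
    _, _, delta, q0, final = nfa
    current = eps_closure(delta, {q0})
    for ch in word:
        moved = {target for source, symbol, target in delta
                 if source in current and symbol == ch}
        current = eps_closure(delta, moved)
    return bool(current & set(final))
```

When the input alphabet is the stack alphabet, a word accepted by such an automaton is one stack that may occur at the control state of interest.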



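A.10 Introspective pushdown systems

An introspective pushdown system is a quadruple $M = (Q, \Gamma, \delta, q_0)$:

1. $Q$ is a finite set of control states;

2. $\Gamma$ is a stack alphabet;

3. $\delta \subseteq Q \times \Gamma^* \times \Gamma_\pm \times Q$ is a transition relation; and

4. $q_0$ is a distinguished root control state.

The second component in the transition relation is a realizable stack at the given
control state. This realizable stack distinguishes an introspective pushdown
system from a general pushdown system. $\mathbb{IPDS}$ denotes the class of all
introspective pushdown systems.

Determining how (or if) a control state $q$ transitions to a control state $q'$ requires
knowing a path taken to the state $q$. Thus, we need to define reachability
inductively. When $M = (Q, \Gamma, \delta, q_0)$, transition from the initial control state
considers only empty stacks:

$$q_0 \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} q \text{ iff } (q_0, \langle\rangle, g, q) \in \delta.$$

For non-root states, the paths to that state matter, since they determine the
stacks realizable with that state:

$$q \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} q' \text{ iff there exists } \vec{g} \text{ such that } q_0 \overset{\vec{g}}{\underset{M}{\longmapsto\!\!\!\!\to}} q \text{ and } (q, [\vec{g}], g, q') \in \delta,$$

$$\text{where } q \overset{\langle g_1, \ldots, g_n \rangle}{\underset{M}{\longmapsto\!\!\!\!\to}} q' \text{ iff } q \overset{g_1}{\underset{M}{\longmapsto\!\!\!\!\to}} q_1 \overset{g_2}{\underset{M}{\longmapsto\!\!\!\!\to}} \cdots \overset{g_n}{\underset{M}{\longmapsto\!\!\!\!\to}} q'.$$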

A.11 Computing reachability in introspective pushdown systems

We cast our reachability algorithm for introspective pushdown systems as finding
a fixed-point, in which we incrementally accrete the reachable control states into
a "Dyck state graph."

A Dyck state graph is a quadruple $G = (S, \Gamma, E, s_0)$, in which:

1. $S$ is a finite set of nodes;

2. $\Gamma$ is a set of frames;

3. $E \subseteq S \times \Gamma_\pm \times S$ is a set of stack-action edges; and

4. $s_0$ is an initial state;

such that for any node $s \in S$, it must be the case that:

$$(s_0, \langle\rangle) \underset{G}{\longmapsto^*} (s, \vec{\gamma}) \text{ for some stack } \vec{\gamma}.$$

In other words, a Dyck state graph is equivalent to a rooted pushdown system in
which there is a legal path to every control state from the initial control state.⁴
We use $\mathbb{DSG}$ to denote the class of Dyck state graphs. (Clearly $\mathbb{DSG} \subset \mathbb{RPDS}$.)

Our goal is to compile an implicitly-defined introspective pushdown system into
an explicitly-constructed Dyck state graph. During this transformation, the
per-state path considerations of an introspective pushdown system are "baked
into" the Dyck state graph. We can formalize this compilation process as a map,
$\mathcal{DSG} : \mathbb{IPDS} \to \mathbb{DSG}$.

Given an introspective pushdown system $M = (Q, \Gamma, \delta, q_0)$, its equivalent Dyck
state graph is $\mathcal{DSG}(M) = (S, \Gamma, E, s_0)$, where $s_0 = q_0$, the set $S$ contains
reachable nodes:

$$S = \{\, q : q_0 \overset{\vec{g}}{\underset{M}{\longmapsto\!\!\!\!\to}} q \text{ for some stack-action sequence } \vec{g} \,\},$$

and the set $E$ contains reachable edges:

$$E = \{\, q \overset{g}{\rightarrowtail} q' : q \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} q' \,\}.$$

Our goal is to find a method for computing a Dyck state graph from an
introspective pushdown system.



A.12 Garbage collection in introspective pushdown systems

Having augmented abstract garbage collection with respect to objects, we are
now ready to embed it into introspective pushdown systems, using the function
$\mathcal{IPDS} : \mathsf{Stmt}^* \to \mathbb{IPDS}$ as presented in Fig. 10.



4 We chose the term Dyck state graph because the sequences of stack actions along 
valid paths through the graph correspond to substrings in Dyck languages. A Dyck 
language is a language of balanced, "colored" parentheses. In this case, each char- 
acter in the stack alphabet is a color. 



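$\mathcal{PDS}(e) = (Q, \Gamma, \delta, q_0)$, where

$$Q = \mathsf{Stmt}^* \times \widehat{\mathsf{FramePointer}} \times \widehat{\mathsf{Store}}$$
$$\Gamma = \widehat{\mathsf{Frame}}$$
$$(q, \epsilon, q') \in \delta \text{ iff } (q, \hat{\kappa}) \rightsquigarrow (q', \hat{\kappa}) \text{ for all } \hat{\kappa}$$
$$(q, \hat{\phi}_-, q') \in \delta \text{ iff } (q, \hat{\phi} : \hat{\kappa}) \rightsquigarrow (q', \hat{\kappa}) \text{ for all } \hat{\kappa}$$
$$(q, \hat{\phi}_+, q') \in \delta \text{ iff } (q, \hat{\kappa}) \rightsquigarrow (q', \hat{\phi} : \hat{\kappa}) \text{ for all } \hat{\kappa}.$$

Fig. 9: $\mathcal{PDS} : \mathsf{Exp} \to \mathbb{RPDS}$.

$\mathcal{IPDS}(e) = (Q, \Gamma, \delta, q_0)$, where

$$Q = \mathsf{Stmt}^* \times \widehat{\mathsf{FramePointer}} \times \widehat{\mathsf{Store}}$$
$$\Gamma = \widehat{\mathsf{Frame}}$$
$$(q, \hat{\kappa}, \epsilon, q') \in \delta \text{ iff } G(q, \hat{\kappa}) \rightsquigarrow (q', \hat{\kappa})$$
$$(q, \hat{\phi} : \hat{\kappa}, \hat{\phi}_-, q') \in \delta \text{ iff } G(q, \hat{\phi} : \hat{\kappa}) \rightsquigarrow (q', \hat{\kappa})$$
$$(q, \hat{\kappa}, \hat{\phi}_+, q') \in \delta \text{ iff } G(q, \hat{\kappa}) \rightsquigarrow (q', \hat{\phi} : \hat{\kappa}).$$

Fig. 10: $\mathcal{IPDS} : \mathsf{Exp} \to \mathbb{IPDS}$.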



A.13 Introspective reachability via Dyck state graphs

Compiling an introspective pushdown system into a Dyck state graph for
exception-flow analysis does not require special modification with respect to
the iterative method. The function $\mathcal{F} : \mathbb{IPDS} \to (\mathbb{DSG} \to \mathbb{DSG})$ generates the monotonic
iteration function we need:

$$\mathcal{F}(M) = f, \text{ where } M = (Q, \Gamma, \delta, q_0),$$
$$f(S, \Gamma, E, s_0) = (S', \Gamma, E', s_0), \text{ where}$$
$$S' = S \cup \{\, s' : s \in S \text{ and } s \underset{M}{\longmapsto\!\!\!\!\to} s' \,\} \cup \{ s_0 \}$$
$$E' = E \cup \{\, s \overset{g}{\rightarrowtail} s' : s \in S \text{ and } s \overset{g}{\underset{M}{\longmapsto\!\!\!\!\to}} s' \,\}.$$

Our implementation of the function $\mathcal{DSG}$ corresponds exactly to the definition
above. The main text details how to compute the Dyck state graph in the
presence of exception flows.
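The shape of this iteration can be sketched as a plain fixed-point loop. The sketch makes strong simplifying assumptions: the root-reachable step relation is supplied as a hypothetical callback `step(graph, s)` returning `(action, successor)` pairs, the frame set is left implicit because the iteration never changes it, and the names `make_iteration` and `fixed_point` are invented for this example. In the real analysis the callback would have to inspect the stacks realizable at `s`, which is where the introspection happens.

```python
def make_iteration(step):
    """Build the iteration function f from a root-reachable step callback."""
    def f(graph):
        nodes, edges, s0 = graph
        new_nodes = set(nodes) | {s0}
        new_edges = set(edges)
        for s in nodes:
            for action, successor in step(graph, s):
                new_nodes.add(successor)
                new_edges.add((s, action, successor))
        return (frozenset(new_nodes), frozenset(new_edges), s0)
    return f


def fixed_point(f, graph):
    """Apply f repeatedly until the graph stops growing."""
    while True:
        nxt = f(graph)
        if nxt == graph:
            return graph
        graph = nxt
```

Starting from `(frozenset(), frozenset(), s0)`, the first application contributes only the root, and each later application adds the nodes and edges one step further out. Termination relies on the node and action sets being finite.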



A.14 Allocation: Polyvariance, context-sensitivity and
object-sensitivity

In the abstract semantics, the abstract allocation functions take the form
$allocFP : \mathsf{Stmt} \times \widehat{\mathsf{Conf}} \to \widehat{\mathsf{FramePointer}}$ and $allocOP : \mathsf{Stmt} \times \widehat{\mathsf{Conf}} \to \widehat{\mathsf{ObjectPointer}}$. The
two allocation functions determine the polyvariance and object-sensitivity of the
analysis. (In control-flow analysis, polyvariance literally refers to the number
of abstract addresses (variants) there are for each variable.) All of the following
allocation approaches can be used with the abstract semantics:

— Monovariance: Pushdown 0CFA Pushdown 0CFA uses the statement
itself as the abstract address, meaning that FramePointer is the call site
statement, and ObjectPointer the instantiation site statement:

$$\widehat{\mathsf{FramePointer}} = \mathsf{Stmt} \qquad \widehat{\mathsf{ObjectPointer}} = \mathsf{Stmt}$$
$$allocFP(s, \hat{c}) = s \qquad allocOP(s, \hat{c}) = s$$

— Pushdown 1CFA Pushdown 1CFA pairs the statement with the current
statement to get an abstract address:

$$\widehat{\mathsf{FramePointer}} = \mathsf{Stmt} \times \mathsf{Stmt}$$
$$allocFP(s, (s', \hat{fp}, \hat{\sigma}, \hat{\kappa})) = (s, s')$$
$$\widehat{\mathsf{ObjectPointer}} = \mathsf{Stmt} \times \mathsf{Stmt}$$
$$allocOP(s, (s', \hat{fp}, \hat{\sigma}, \hat{\kappa})) = (s, s')$$

— Pushdown k-CFA Pushdown k-CFA looks beyond the current state and
at the last k states. By concatenating the statements in the last k states
together, and pairing this sequence with a variable, we get pushdown k-CFA:

$$\widehat{\mathsf{FramePointer}} = \mathsf{Stmt} \times \mathsf{Stmt}^*$$
$$allocFP(s, \langle (s_1, \hat{fp}, \hat{\sigma}, \hat{\kappa}), \ldots \rangle) = (s, \langle s_{1_0}, \ldots, s_{k_0} \rangle)$$
$$\widehat{\mathsf{ObjectPointer}} = \mathsf{Stmt} \times \mathsf{Stmt}^*$$
$$allocOP(s, \langle (s_1, \hat{fp}, \hat{\sigma}, \hat{\kappa}), \ldots \rangle) = (s, \langle s_{1_0}, \ldots, s_{k_0} \rangle)$$
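The three allocation strategies can be compared side by side in a short sketch. It assumes, for illustration only, that the analysis passes the allocator a `history` list of recent states, most recent first, each state being a tuple whose first component is its current statement; the function names are hypothetical. Frame pointers and object pointers use the same allocator shape here, so only one family is shown.

```python
def alloc_0cfa(stmt, history):
    """Monovariant: the address is the allocating statement itself."""
    return stmt


def alloc_1cfa(stmt, history):
    """Pair the allocating statement with the statement of the current state."""
    current_stmt = history[0][0]
    return (stmt, current_stmt)


def alloc_kcfa(stmt, history, k):
    """Pair the allocating statement with the statements of the last k states."""
    return (stmt, tuple(state[0] for state in history[:k]))
```

Larger values of k distinguish more contexts and therefore yield more abstract addresses per statement, at the cost of a larger abstract state space.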

In addition, the abstract syntax tree (AST) provides much static context
information: for each statement, we know its line number and the class and
method to which it belongs. By default, we also instrument statements with this
information to complement the context formalized above.

A.15 System architecture

We have implemented the analytic framework in Scala. Figure 11 presents the
system architecture: apktool extracts the .dex file from Android applications [2].
JDex2Sex extracts class files from the .dex file to generate an S-expression
encoding of the dex file. The S-expression IR is then fed into the Dalvik Parser
and parsed into a Dalvik AST. The Transformer takes another pass over the Dalvik
AST to instrument push-handler statements and pop-handler pseudo-statements,
and to attach some other context information to statements. Preanalysis,
specifically, live
register analysis, is performed right after Transformer. It is an intra-procedural 
backward data flow analysis on instructions for each method pQ. 
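A backward liveness analysis of this kind can be sketched as follows. This is a textbook-style illustration, not our Scala implementation: instructions are assumed to be given as `(defs, uses, successor indices)` triples for a single method, and the name `live_registers` is hypothetical.

```python
def live_registers(instrs):
    """Compute live-in and live-out register sets for each instruction.

    instrs is a list of (defs, uses, succs) triples, where defs and uses are
    sets of register names and succs lists the indices of successor instructions.
    """
    n = len(instrs)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    changed = True
    while changed:
        changed = False
        for i in reversed(range(n)):            # backward pass converges faster
            defs, uses, succs = instrs[i]
            out = set().union(*(live_in[j] for j in succs))
            new_in = set(uses) | (out - set(defs))
            if out != live_out[i] or new_in != live_in[i]:
                live_out[i], live_in[i] = out, new_in
                changed = True
    return live_in, live_out
```

Registers that are not live at a program point need not be kept reachable there, which is how such a preanalysis can restrict what the later abstract garbage collection treats as live.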

The core pushdown analytic components start from the second row in Fig. 11.
The implementation of each component follows its corresponding formulation: the
Stack-based CESK machine embodies the abstract state space and the abstract
transition relations of the main text. The (I)PDCFA Machinery injects the
program into a rooted pushdown system (Figure 10). A --gc flag determines
whether we use the PDCFA Machinery or the (I)PDCFA Machinery. The Dyck State
Graph Machinery implements the fixed-point synthesis algorithm (summarized in
Appendix A.13). In the following section, we will focus on the details of the
summarization algorithm in this machinery in handling exception flows.
[Figure: architecture diagram. Recoverable component labels: Preanalysis (Live
Register Analysis); Stack-based CESK machinery (state space, transition rules);
Dyck State Graph Machinery.]

Fig. 11: System architecture of (I)pushdown exception-flow analysis. (Lines
without arrows indicate that the components are implicitly connected.)



